Hierarchical Phrase Table Combination for Machine Translation
نویسندگان
چکیده
Typical statistical machine translation systems are batch trained with a given training data and their performances are largely influenced by the amount of data. With the growth of the available data across different domains, it is computationally demanding to perform batch training every time when new data comes. In face of the problem, we propose an efficient phrase table combination method. In particular, we train a Bayesian phrasal inversion transduction grammars for each domain separately. The learned phrase tables are hierarchically combined as if they are drawn from a hierarchical Pitman-Yor process. The performance measured by BLEU is at least as comparable to the traditional batch training method. Furthermore, each phrase table is trained separately in each domain, and while computational overhead is significantly reduced by training them in parallel.
منابع مشابه
Reverse Word Order Models
In this work, we study the impact of the word order decoding direction for statistical machine translation (SMT). Both phrase-based and hierarchical phrasebased SMT systems are investigated by reversing the word order of the source and/or target language and comparing the translation results with the normal direction. Analysis are done on several components such as alignment model, language mod...
متن کاملDynamic Phrase Tables for Machine Translation in an Interactive Post-editing Scenario
This paper presents a phrase table implementation for theMoses system that computes phrase table entries for phrase-based statistical machine translation (PBSMT) on demand by sampling an indexed bitext. While this approach has been used for years in hierarchical phrase-based translation, the PBSMT community has been slow to adopt this paradigm, due to concerns that this would be slow and lead t...
متن کاملThe RWTH machine translation system for IWSLT 2007
The RWTH system for the IWSLT 2007 evaluation is a combination of several statistical machine translation systems. The combination includes Phrase-Based models, a n-gram translation model and a hierarchical phrase model. We describe the individual systems and the method that was used for combining the system outputs. Compared to our 2006 system, we newly introduce a hierarchical phrase-based tr...
متن کاملHierarchical Phrase-Based Translation with Suffix Arrays
A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly lookup and extract rules on the fly. Hierarchical phrasebased translation introduces the added wrink...
متن کاملJane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
We present Jane 2, an open source toolkit supporting both the phrase-based and the hierarchical phrase-based paradigm for statistical machine translation. It is implemented in C++ and provides efficient decoding algorithms and data structures. This work focuses on the description of its phrase-based functionality. In addition to the standard pipeline, including phrase extraction and parameter o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013